feat(node): skip checkpoint sync when recent state data is in the db#403
Conversation
Greptile SummaryAdds a startup fast-path that skips the checkpoint-sync network round-trip when a persisted store is already within
Confidence Score: 4/5Safe to merge; the fast-path only fires under a narrow set of conditions and falls through to the existing checkpoint-sync path in all other cases. The core logic — slot-gap arithmetic, genesis-time validation, and the backend Arc clone/drop sequence — is correct and consistent with existing codebase patterns. The one noteworthy gap is that
|
| Filename | Overview |
|---|---|
| crates/storage/src/store.rs | Adds from_db_state constructor and MAX_RESUMABLE_DB_STATE_AGE constant; logic is sound but the constructor only validates KEY_CONFIG, so a DB with missing KEY_LATEST_FINALIZED would panic in the caller rather than returning None. |
| bin/ethlambda/src/main.rs | Checkpoint-sync fast-path added before network fetch; slot-gap arithmetic using saturating_sub and MILLISECONDS_PER_SLOT is consistent with the existing codebase pattern. |
| crates/storage/src/lib.rs | Re-exports MAX_RESUMABLE_DB_STATE_AGE from store; trivial one-line change, no issues. |
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[fetch_initial_state called] --> B{checkpoint_url provided?}
B -- No --> C[Build genesis Store\nfrom_anchor_state]
B -- Yes --> D[Store::from_db_state\nbackend.clone, genesis_time]
D -- None: empty or\ngenesis_time mismatch --> G
D -- Some store --> E[Compute slot gap\ncurrent_slot - finalized_slot]
E --> F{gap <= 450 slots?}
F -- Yes --> H[Return existing\nDB store - skip sync]
F -- No --> I[warn: state stale\ndrop store]
I --> G[checkpoint_sync:\nfetch_finalized_anchor]
G --> J[get_forkchoice_store\nwrites new state to backend]
J --> K[Return new Store]
Prompt To Fix All With AI
Fix the following 1 code review issue. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 1
crates/storage/src/store.rs:527-558
**`from_db_state` only validates `KEY_CONFIG` before returning a `Store`**
The function checks that `KEY_CONFIG` is present and that `genesis_time` matches, then constructs and returns a `Store`. Immediately in the caller (`main.rs`), `store.latest_finalized().slot` is invoked, which calls `get_metadata(KEY_LATEST_FINALIZED)`. That helper unconditionally panics with `"metadata key exists"` if the key is absent. If the DB ever has `KEY_CONFIG` but is missing `KEY_LATEST_FINALIZED` (e.g. a schema change between node versions, a partially-migrated DB, or a crash *between* two separate write batches in future refactors), the node will panic at startup instead of falling through cleanly to checkpoint sync. Consider probing `KEY_LATEST_FINALIZED` inside `from_db_state` and returning `None` when it is absent, consistent with the function's existing "empty or inconsistent → `None`" contract.
Reviews (1): Last reviewed commit: "Add tests" | Re-trigger Greptile
| /// Build a Store from the state already persisted in the storage backend. | ||
| /// | ||
| /// Returns `None` if the backend is empty or its persisted `genesis_time` | ||
| /// doesn't match `expected_genesis_time`. | ||
| pub fn from_db_state( | ||
| backend: Arc<dyn StorageBackend>, | ||
| expected_genesis_time: u64, | ||
| ) -> Option<Self> { | ||
| let persisted_config = { | ||
| let view = backend.begin_read().expect("read view"); | ||
| let bytes = view.get(Table::Metadata, KEY_CONFIG).expect("get config")?; | ||
| ChainConfig::from_ssz_bytes(&bytes).expect("valid config") | ||
| }; | ||
| if persisted_config.genesis_time != expected_genesis_time { | ||
| warn!( | ||
| db_genesis_time = persisted_config.genesis_time, | ||
| expected_genesis_time, | ||
| "Persisted DB has a different genesis_time; treating as empty" | ||
| ); | ||
| return None; | ||
| } | ||
| info!("Loaded store from persisted DB state"); | ||
| Some(Self { | ||
| backend, | ||
| new_payloads: Arc::new(Mutex::new(PayloadBuffer::new(NEW_PAYLOAD_CAP))), | ||
| known_payloads: Arc::new(Mutex::new(PayloadBuffer::new(AGGREGATED_PAYLOAD_CAP))), | ||
| gossip_signatures: Arc::new(Mutex::new(GossipSignatureBuffer::new( | ||
| GOSSIP_SIGNATURE_CAP, | ||
| ))), | ||
| }) | ||
| } | ||
|
|
There was a problem hiding this comment.
from_db_state only validates KEY_CONFIG before returning a Store
The function checks that KEY_CONFIG is present and that genesis_time matches, then constructs and returns a Store. Immediately in the caller (main.rs), store.latest_finalized().slot is invoked, which calls get_metadata(KEY_LATEST_FINALIZED). That helper unconditionally panics with "metadata key exists" if the key is absent. If the DB ever has KEY_CONFIG but is missing KEY_LATEST_FINALIZED (e.g. a schema change between node versions, a partially-migrated DB, or a crash between two separate write batches in future refactors), the node will panic at startup instead of falling through cleanly to checkpoint sync. Consider probing KEY_LATEST_FINALIZED inside from_db_state and returning None when it is absent, consistent with the function's existing "empty or inconsistent → None" contract.
Prompt To Fix With AI
This is a comment left during a code review.
Path: crates/storage/src/store.rs
Line: 527-558
Comment:
**`from_db_state` only validates `KEY_CONFIG` before returning a `Store`**
The function checks that `KEY_CONFIG` is present and that `genesis_time` matches, then constructs and returns a `Store`. Immediately in the caller (`main.rs`), `store.latest_finalized().slot` is invoked, which calls `get_metadata(KEY_LATEST_FINALIZED)`. That helper unconditionally panics with `"metadata key exists"` if the key is absent. If the DB ever has `KEY_CONFIG` but is missing `KEY_LATEST_FINALIZED` (e.g. a schema change between node versions, a partially-migrated DB, or a crash *between* two separate write batches in future refactors), the node will panic at startup instead of falling through cleanly to checkpoint sync. Consider probing `KEY_LATEST_FINALIZED` inside `from_db_state` and returning `None` when it is absent, consistent with the function's existing "empty or inconsistent → `None`" contract.
How can I resolve this? If you propose a fix, please make it concise.
When
--checkpoint-sync-urlis provided, the node currently always downloads a fresh finalized state from the peer, even if a recent state is already on disk. Skip the network round-trip when the persisted state is fresh enough to resume from.This PR introduces a state data freshness threshold -
MAX_RESUMABLE_DB_STATE_AGE = 450slots (~30 min at 4s/slot) - picked conservatively considering the per-block backfill cost. I couldn't identify in the specs a formal weak-subjectivity period or a method of calculating it, so this is a judgement call; happy to take any suggestions on the value or a better approach for it.What Changed
crates/storage/src/store.rs— addedStore::from_db_state(backend, expected_genesis_time), a no-write constructor that wraps an already-initialized backend. ReturnsNoneif the backend is empty or its persistedgenesis_timedoesn't match. Added theMAX_RESUMABLE_DB_STATE_AGE = 450constant (~30 min at 4s/slot).crates/storage/src/lib.rs— re-exportedMAX_RESUMABLE_DB_STATE_AGE.bin/ethlambda/src/main.rs— in the checkpoint-sync branch offetch_initial_state, tryStore::from_db_statefirst. If the persisted finalized slot is withinMAX_RESUMABLE_DB_STATE_AGEof wall-clock, return the resumed store and skip checkpoint sync. Otherwise warn and fall through to the existing sync path.Correctness / Behavior Guarantees
genesis_timematches ANDcurrent_slot - latest_finalized.slot <= MAX_RESUMABLE_DB_STATE_AGE. Everything else remains unchanged.BlocksByRoot-only backfill cost; can be increased onceBlocksByRangelong-range sync (feat(p2p): use BlocksByRange for long-range sync #351) is added.Tests Added / Run
Three unit tests in
crates/storage/src/store.rscovering thefrom_db_statecontract:from_db_state_returns_none_on_empty_backendfrom_db_state_returns_some_on_matching_genesis_timefrom_db_state_returns_none_on_genesis_time_mismatchRelated Issues / PRs
MAX_RESUMABLE_DB_STATE_AGEcan probably be raised towardSTATES_TO_KEEP)✅ Verification Checklist
make fmt— cleanmake lint(clippy with-D warnings) — cleancargo test --workspace --release— all passing